Introduction

The goal of this project is to build a machine learning model that predicts whether a patient will die from heart failure, based on the patient's medical history and vital signs.

What is Heart Failure

Although the name suggests that the heart has stopped, this is not the case. Heart failure, also known as congestive heart failure, is a serious, incurable condition in which the heart does not pump enough blood to meet the body's needs. Heart failure may occur if the heart can’t fill up with enough blood or if the heart is simply too weak to pump properly.

According to the Centers for Disease Control and Prevention (CDC), more than 6 million adults in the United States suffer from heart failure.

According to the National Heart, Lung, and Blood Institute (NHLBI), “Heart failure may not cause symptoms right away. But eventually, you may feel tired and short of breath and notice fluid buildup in your lower body, around your stomach, or your neck.” Heart failure can also eventually cause damage to other organs such as the liver or kidneys and lead to other conditions such as pulmonary hypertension, heart valve disease, and sudden cardiac arrest.

Although heart failure is incurable, the Mayo Clinic states that “Proper treatment can improve the signs and symptoms of heart failure and may help some people live longer,” and that “Lifestyle changes - such as losing weight, exercising, and managing stress - can improve your quality of life.”

Why Predict Death by Heart Failure

Although heart failure is incurable, it is still useful for medical professionals to predict whether a patient may develop, and potentially die from, heart failure. For example, if a doctor can determine that a patient is likely to develop heart failure later in life, the doctor can inform the patient so that they can make lifestyle changes early enough to prevent the most significant symptoms.

Additionally, the body initially tries to compensate for heart failure through mechanisms such as enlarging the heart, developing more muscle mass, or pumping faster. These responses are only temporary, and heart failure will progress until more serious symptoms such as fatigue or breathing problems appear. Since treatment can often slow the progression of heart failure, a machine learning model that successfully predicts a person’s risk of dying from heart failure could increase early detection, so that more cases are caught early and the progression of the disease is slowed.

Since the data set I will use records deaths from heart failure, an effective machine learning model built on it could also allow doctors to begin treatment preemptively, which may prevent the patient from dying of heart failure.

About the Data set

Project Outline

Exploratory Data Analysis

Loading in Packages and Data

We begin by loading the packages used in this project and the raw heart failure data.

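# Load the libraries used in this project
library(tidyverse)    # data wrangling and plotting (attaches ggplot2, dplyr, readr)
library(tidymodels)   # modeling framework
library(knitr)
library(corrplot)     # correlation plots
library(ggthemes)
library(gt)           # display tables
library(gtExtras)     # gt themes such as gt_theme_nytimes()
library(rbibutils)
library(bibtex)
tidymodels_prefer()   # resolve function name conflicts in favor of tidymodels

# Read the raw data into a data frame
heartfailure_data <- read_csv("heart_failure_clinical_records_dataset.csv")

# Preview the first six rows as a formatted table
head(heartfailure_data) %>%
  gt() %>%
  gt_theme_nytimes() %>%
  tab_header("Heart Failure Data")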
Heart Failure Data
age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  high_blood_pressure  platelets  serum_creatinine  serum_sodium  sex  smoking  time  DEATH_EVENT
75   0        582                       0         20                 1                    265000     1.9               130           1    0        4     1
55   0        7861                      0         38                 0                    263358     1.1               136           1    0        6     1
65   0        146                       0         20                 0                    162000     1.3               129           1    1        7     1
50   1        111                       0         20                 0                    210000     1.9               137           1    0        7     1
65   1        160                       1         20                 0                    327000     2.7               116           0    0        8     1
90   1        47                        0         40                 1                    204000     2.1               132           1    1        8     1
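
As a quick sanity check on a freshly loaded data set, it can help to confirm the column types, look for missing values, and tabulate the outcome. The following is a minimal sketch, not necessarily the approach used later in this project. To stay runnable without the CSV file, it retypes the six preview rows shown above into a hypothetical `preview` tibble. Treating the 0/1 columns as categorical is an assumption based only on the values visible in the preview, and the names `binary_cols`, `preview_clean`, `missing_counts`, and `outcome_counts` are illustrative.

```r
library(tibble)
library(dplyr)

# The six preview rows shown above, typed in by hand so the sketch runs
# without the CSV file. With the real data, heartfailure_data would be used.
preview <- tribble(
  ~age, ~anaemia, ~creatinine_phosphokinase, ~diabetes, ~ejection_fraction,
  ~high_blood_pressure, ~platelets, ~serum_creatinine, ~serum_sodium,
  ~sex, ~smoking, ~time, ~DEATH_EVENT,
  75, 0,  582, 0, 20, 1, 265000, 1.9, 130, 1, 0, 4, 1,
  55, 0, 7861, 0, 38, 0, 263358, 1.1, 136, 1, 0, 6, 1,
  65, 0,  146, 0, 20, 0, 162000, 1.3, 129, 1, 1, 7, 1,
  50, 1,  111, 0, 20, 0, 210000, 1.9, 137, 1, 0, 7, 1,
  65, 1,  160, 1, 20, 0, 327000, 2.7, 116, 0, 0, 8, 1,
  90, 1,   47, 0, 40, 1, 204000, 2.1, 132, 1, 1, 8, 1
)

# Assumption: these columns are 0/1 indicators and should be categorical.
binary_cols <- c("anaemia", "diabetes", "high_blood_pressure",
                 "sex", "smoking", "DEATH_EVENT")

# Convert the indicator columns to factors with both levels declared,
# so a level that happens to be absent in a subset is still known.
preview_clean <- preview %>%
  mutate(across(all_of(binary_cols), ~ factor(.x, levels = c(0, 1))))

# Number of missing values in each column
missing_counts <- colSums(is.na(preview_clean))

# Number of rows for each observed value of the outcome
outcome_counts <- count(preview_clean, DEATH_EVENT)

print(missing_counts)
print(outcome_counts)
```

All six preview rows have `DEATH_EVENT` equal to 1, so the class balance only becomes meaningful when the same `count()` call is run on the full data frame.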

The data was obtained from the Kaggle data set “Heart Failure Prediction”, with the original data coming from a study conducted by Tanvir Ahmad, Assia Munir, Sajjad Haider Bhatti, Muhammad Aftab, and Muhammad Ali Raza (Ahmad et al., 2017).

References